Using the classical linear predictor
What we don’t see:
The model uses a Gaussian family with an identity link
The link function maps the linear predictor \(\eta = \beta_0 + \beta_1 \cdot \text{age}\) onto the scale of the response variable \(Y\)
Using the classical linear predictor
Predicted values (IN VIOLA) can fall outside the valid range for accuracy [0,1]
In the first example, the identity link was appropriate
Here, the identity link is not appropriate
Predicted values (IN VIOLA) fall within the valid range for accuracy [0,1]
link="logit" makes sure that y spans from 0 and 1
Adding a new main effect: group (IN BLU)
group = 0)group = 1)link="identity"link="identity"A positive interaction emerges
Call:
glm(formula = accuracy ~ age * group, data = d)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.529062 0.016916 31.28 <2e-16 ***
age 0.052541 0.002103 24.99 <2e-16 ***
group1 -0.566758 0.023871 -23.74 <2e-16 ***
age:group1 0.059790 0.002967 20.15 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.0030859)
Null deviance: 15.9775 on 999 degrees of freedom
Residual deviance: 3.0736 on 996 degrees of freedom
AIC: -2937
Number of Fisher Scoring iterations: 2
link="logit"link="logit"A negative interaction emerges
fit = glm(accuracy ~ age*group, data=d, family=binomial(link="logit"), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "logit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.26482 0.32430 -28.568 < 2e-16 ***
age 1.69491 0.04842 35.006 < 2e-16 ***
group1 1.55052 0.36909 4.201 2.66e-05 ***
age:group1 -0.40870 0.05457 -7.490 6.90e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 957.55 on 996 degrees of freedom
AIC: 3033
Number of Fisher Scoring iterations: 5
k = 50
N = 1000
group = rbinom(N,1,.5)
age = runif(N,6,10)
eta = -6+1*age-1*group
probs = mafc.probit(.m = 2)$linkinv(eta)
accuracy = rbinom(n = N, size = k, prob = probs) / k
d = data.frame(
age = age,
age_c = age - mean(age),
accuracy = accuracy,
group = as.factor(group)
)
ggplot(d, aes(x = age, y = accuracy, color = group)) +
geom_point() +
scale_x_continuous(limits = c(6, 10), breaks = seq(6, 10, 1))No interaction was simulated
Both models are detecting an interaction that does not exist
Using 2 alternatives forced-choice probit link,
we account for the 50% chance level in a TRUE/FALSE task.
link="mafc.probit(2)"No interaction emerges, in line with how the data were generated
fit = glm(accuracy ~ age*group, data=d, family=binomial(link=mafc.probit(.m=2)), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)),
data = d, weights = rep(k, nrow(d)))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.911744 0.209295 -28.246 < 2e-16 ***
age 0.987424 0.029927 32.994 < 2e-16 ***
group1 -1.074050 0.266644 -4.028 5.62e-05 ***
age:group1 0.006767 0.036919 0.183 0.855
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8429.14 on 999 degrees of freedom
Residual deviance: 757.35 on 996 degrees of freedom
AIC: 2783.1
Number of Fisher Scoring iterations: 6
link="identity"Equal intervals on X correspond to equal intervals on Y
In our example: \(\eta = \beta_0 + \beta_1 \cdot age\)
link="logit"Equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
link=mafc.probit(2)Equal intervals on X do NOT correspond to equal intervals on Y
Building a model means approximating the data-generating process (never observed directly in real data)
That requires key choices:
Tip
Choose an appropriate distribution (family)
so predicted values remain within the outcome’s valid range
Tip
Choose an appropriate link function
because the link defines the scale of effects and the wrong link can create spurious interactions
We are conducting a systematic review to assess how often inappropriate link functions are used when testing interactions in psychological research
So far, this appears to happen quite often
Data simulation, code and presentation are available on GitHub: sitalaura/link-functions
Questions and feedbacks: laura.sita@studenti.unipd.it
Domingue, B. W., Kanopka, K., Trejo, S., Rhemtulla, M., & Tucker-Drob, E. M. (2024). Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome’s distribution and metric properties. Psychological methods, 29(6), 1164.
Hardwicke, T. E., Thibault, R. T., Clarke, B., Moodie, N., Crüwell, S., Schiavone, S. R., Handcock, S. A., Nghiem, K. A., Mody, F., Eerola, T., et al. (2024). Prevalence of transparent research practices in psychology: A cross-sectional study of empirical articles published in 2022. Advances in Methods and Practices in Psychological Science, 7 (4), 25152459241283477.
Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong?. Journal of Experimental Social Psychology, 79, 328-348.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological bulletin, 105(1), 156.
Special thanks to
link="probit"link="probit"A negative interaction emerges
fit = glm(accuracy ~ age*group, data=d, family=binomial(link="probit"), weights= rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "probit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.21133 0.03018 73.280 < 2e-16 ***
age 0.81113 0.02295 35.337 < 2e-16 ***
group1 -0.79152 0.03400 -23.279 < 2e-16 ***
age:group1 -0.11299 0.02637 -4.285 1.83e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 853.32 on 996 degrees of freedom
AIC: 2928.7
Number of Fisher Scoring iterations: 6
Cognitive Science Arena 2026